Concentration of occurrence
In English this is called "adaptation."
The "Noriega" in the title is the name of a politician, much like "Kennedy."
When the probability of a word appearing is p, we tend to think that the probability of it appearing twice is p^2.
But in reality, once a word has appeared, it is much more likely to appear again.
How much more likely?
Surprisingly, the conditional probability of a second occurrence given a first, $ Pr(k\ge2|k\ge1) $, does not depend on the probability of a first occurrence, $ Pr(k\ge1) $.
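Here D is the total number of documents and DF_k is the number of documents in which the word appears at least k times.
$ Pr(k\ge1) = DF_1/D
$ Pr(k\ge2|k\ge1) = (DF_2/D)/Pr(k\ge1) = DF_2/DF_1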
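The two estimates above can be computed directly from document counts. The following is a minimal sketch, not a reference implementation: the function name `occurrence_stats` and the toy corpus are made up for illustration, and documents are assumed to be already split into word tokens.

```python
def occurrence_stats(docs, word):
    """Return (Pr(k>=1), Pr(k>=2 | k>=1)) for `word`.

    `docs` is a list of tokenized documents (lists of strings).
    DF_1 and DF_2 are the numbers of documents containing the word
    at least once and at least twice, respectively.
    """
    D = len(docs)
    counts = [doc.count(word) for doc in docs]
    df1 = sum(1 for c in counts if c >= 1)
    df2 = sum(1 for c in counts if c >= 2)
    p1 = df1 / D if D else 0.0
    # DF_2 / DF_1; defined as 0.0 here when the word never occurs.
    p2_given_1 = df2 / df1 if df1 else 0.0
    return p1, p2_given_1


# Toy corpus, invented for illustration only.
docs = [
    "kennedy spoke and kennedy left".split(),
    "nobody came except him".split(),
    "kennedy met kennedy aides".split(),
    "all except one agreed".split(),
]

print(occurrence_stats(docs, "kennedy"))  # (0.5, 1.0)
print(occurrence_stats(docs, "except"))   # (0.5, 0.0)
```

In this toy corpus both words appear in half of the documents, yet only one of them repeats within a document, so the two conditional probabilities differ even though Pr(k≥1) is the same.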
https://gyazo.com/b7dc9d57f2789bf54c540242d6543f29
Comparing words with the same probability of occurrence, "Kennedy," for example, shows a higher concentration of occurrence than "except."
https://gyazo.com/d42ac8acd36b57adc7319d3aa6f03992
From this discussion, it seems that words in general follow this kind of distribution, and that keywords sit toward its upper end. I would like to compare the distribution for words in general with the distribution for keywords.
Conversely, for any given string, DF_2/DF_1 is an indicator of "keywordiness" that is independent of its frequency of occurrence.
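As a rough sketch of how this indicator could be used on arbitrary strings rather than tokenized words: the function name `keywordiness`, the candidate list, and the sample texts below are all hypothetical. Occurrences are counted as non-overlapping substring matches with `str.count`, which is one possible choice among several.

```python
def keywordiness(texts, s):
    """DF_2 / DF_1 for the string `s` over raw texts.

    Counts non-overlapping substring matches, so `s` need not be a
    whole word. Returns 0.0 if `s` occurs in no text.
    """
    counts = [t.count(s) for t in texts]
    df1 = sum(1 for c in counts if c >= 1)
    df2 = sum(1 for c in counts if c >= 2)
    return df2 / df1 if df1 else 0.0


# Sample texts, invented for illustration only.
texts = [
    "the noriega trial: noriega denies charges",
    "the rain was mild",
    "noriega and the noriega regime",
    "the market fell and the rain came",
]

candidates = ["the", "noriega"]
ranked = sorted(candidates, key=lambda s: keywordiness(texts, s), reverse=True)

for s in ranked:
    print(s, keywordiness(texts, s))
# noriega 1.0
# the 0.25
```

Here "the" appears in more texts than "noriega" but scores lower, which illustrates how the ratio separates repetition within a text from overall frequency.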
---
This page is auto-translated from /nishio/出現集中 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.